Skip to content

Conversation

@jasonleenaylor
Copy link
Contributor

@jasonleenaylor jasonleenaylor commented Apr 8, 2025

  • There was a valid class of combining character categories related to extended whitespace not yet handled

This change is Reviewable

* There was a valid class of combining character categories
  related to extended whitespace not yet handled
@github-actions
Copy link

github-actions bot commented Apr 8, 2025

LCM Tests

    16 files  ±0      16 suites  ±0   2m 54s ⏱️ -1s
 2 830 tests ±0   2 810 ✅ ±0   20 💤 ±0  0 ❌ ±0 
11 268 runs  ±0  11 100 ✅ ±0  168 💤 ±0  0 ❌ ±0 

Results for commit 998ae86. ± Comparison against base commit 4f1aa51.

Copy link
Contributor

@mark-sil mark-sil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jasonleenaylor)

Copy link
Contributor

@hahn-kev hahn-kev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do all these magic numbers mean?

Copy link
Contributor Author

@jasonleenaylor jasonleenaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic comes out of unicode normalization ordering that I have not bothered to dig into. While debugging I identified that calling the unicode function for these whitespace characters was returning a 2 which was not supported by the existing code. I couldn't find any resources that explained what '2' should mean, or even that it existed as a valid combining class. I eventually decided that if unorm2_getCombiningClass() could return 2 for these characters the initialization of the mapping ought to support it. We don't actually care about those values in our current uses of the code. An alternative solution would have been to use try/catch or TryGetValue for all dictionary access in UcdProperty but then I would have to pick a reasonable default behavior for each detail.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jasonleenaylor)

@jasonleenaylor jasonleenaylor merged commit 27a891c into master Apr 14, 2025
5 checks passed
@jasonleenaylor jasonleenaylor deleted the LT-22032 branch April 14, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants